Effect of utilizing terminology on extraction of protein-protein interaction information from biomedical literature
Abstract
As the amount of online scientific literature in the biomedical domain grows, automatic processing has become a promising way to accelerate research. We apply a syntactic parser trained on general-domain text to identify protein-protein interactions. One of the main obstacles to applying language processing in this domain is the prevalence of specialized terminology. We therefore built a specialized dictionary by compiling online glossaries and applied it to information extraction. In preliminary experiments on one hundred sentences, we compared extraction performance when using (a) only a general dictionary and (b) the general dictionary plus our specialized dictionary. Contrary to our expectation, the general dictionary alone performed better (recall 93.0%, precision 91.0%) than the terminology-based configuration (recall 92.9%, precision 89.6%).
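To make the comparison concrete, the sketch below scores extracted interactions against a hand-annotated gold set and folds precision and recall into a single F1 score. It is an illustration under stated assumptions, not the authors' evaluation code: representing an interaction as an unordered protein pair, the helper names `normalize`, `precision_recall` and `f1`, and the use of F1 itself are all assumptions here, since the abstract reports only recall and precision.

```python
from typing import Set, Tuple

# Assumption for illustration: an interaction is an unordered pair of protein
# names. The abstract does not specify how interactions are represented.
Pair = Tuple[str, str]


def normalize(pair: Pair) -> Pair:
    """Order the two names so that (A, B) and (B, A) count as the same interaction."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def precision_recall(gold: Set[Pair], predicted: Set[Pair]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted interactions against gold interactions."""
    gold_n = {normalize(p) for p in gold}
    pred_n = {normalize(p) for p in predicted}
    true_pos = len(gold_n & pred_n)  # interactions found by both
    precision = true_pos / len(pred_n) if pred_n else 0.0
    recall = true_pos / len(gold_n) if gold_n else 0.0
    return precision, recall


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Recall and precision as reported in the abstract for the two configurations.
    general = f1(precision=0.910, recall=0.930)
    terminology = f1(precision=0.896, recall=0.929)
    print(f"general dictionary:           F1 = {general:.3f}")
    print(f"general + specialized terms:  F1 = {terminology:.3f}")
```

Applied to the reported figures, this gives an F1 of about 0.920 for the general dictionary alone and about 0.912 with the specialized dictionary added. The lower precision of the terminology-based configuration outweighs its essentially unchanged recall, which matches the abstract's conclusion that the general dictionary performed better.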